As psychologists become more involved in biomedical research issues, they are increasingly confronted by an array of
methods and terms which are relatively new to them, methods and terms which lie within the province of such disciplines as
epidemiology and biostatistics. The exposure to unfamiliar terms and methods can result in uncertainty and limit the
effectiveness of psychologists in biomedical research.


Objective


The objective of this study was to examine the methods commonly employed in epidemiologic and biomedical research and
to relate them to the methods traditionally used in the field of psychology. The intent of this report is to describe some
of these methods and explain them in a clear and concise fashion.

Approach

The study was conducted in two discrete phases. In the first phase, research reports and journal articles were surveyed
to ascertain the range of methods used by epidemiologists and medical researchers. These methods were placed into two
categories: The first category included those statistical methods and concepts commonly used by psychologists; the second
category included methods and concepts which appeared to be unique to epidemiology and biostatistics.

In the second phase, several epidemiologic and biostatistics texts were consulted and used in an attempt to explain the
principles behind the methods and concepts placed in the second category. Examples which displayed the potential
applications of these methods also were selected.

Results 

The survey revealed many similarities in statistical methods which are quite familiar to psychologists. Hypotheses are
formulated and tested in much the same manner and chi-square, regression, correlation, and analyses of variance are commonly
employed in studies of morbidity and mortality.

It also was found that epidemiologic studies employ rates and measures which, although seldom seen in psychology, are
based on statistical concepts and principles underlying the methods developed by psychologists. Rates such as the
standardized mortality ratio and incidence and prevalence rates are measures of probability. Measures of association such as the
relative risk and phi coefficient are grounded in the comparison between observed and expected frequencies on which the
chi-square test employed by psychologists is based.


It is concluded that, despite the differences in terminology and frequent use of rates which are not found in psychology,
the gap between biostatistics and psychological statistics is neither large nor complex. Because the methods of
epidemiology, biostatistics, and psychology are based on common statistical principles, relatively few shifts in statistical
thinking are required, other than an understanding of the terminology employed, for psychologists to attain a basic
comprehension of epidemiologic findings.
An Epidemiology Primer: Bridging the Gap
between Epidemiology and Psychology 




Psychologists in recent years have begun to take an active role in health-related research: to pose research questions,
raise methodological issues, and furnish answers to biomedical problems. In this new role they are likely to experience
some degree of confusion and frustration; familiar ground may be obscured by new labels or subtle changes in landscape.
Areas formerly traversed with confidence may now cause uncertainty or trepidation. It may be felt that biostatistics
represents an entirely new set of concepts and methods that requires extensive "re-tooling" for understanding and application.

How readily can psychologists familiarize themselves with this "new ground"? What shifts in statistical thinking are
necessary to attain a basic comprehension of epidemiologic findings? The objective of this paper is to answer these questions by
examining statistical techniques commonly used in epidemiologic studies that are generally unfamiliar to psychologists and,
thus, to some extent bridge the communication gap between these disciplines.

To begin, it must be recognized that all applications, whether in the biological and health sciences or psychology,
derive from the same general theory of statistics. Therefore, biostatisticians and psychologists are taught the same basic
statistical concepts and methods. The existence of a common background can be shown by a comparison of textbooks in these
fields which reveals very similar tables of contents including chapters on descriptive statistics, probability, the binomial
distribution, the normal distribution, estimation, hypothesis testing, chi-square tests, correlation and linear regression,
analysis of variance, and nonparametric methods. Epidemiologists and biostatisticians also are taught the special tools of
demography and vital statistics (morbidity and mortality rates and ratios) while psychologists typically learn factor
analysis and psychological scaling techniques. For the psychologist, the area of vital statistics is undoubtedly very puzzling
because of the unfamiliar terminology and the unique epidemiologic perspective. Therefore, our discussion will begin with a
brief description of the typical approaches taken in epidemiologic studies, specifically the nature of cross-sectional,
retrospective, and prospective studies and the case-control and cohort methods. From this perspective, we will consider
rates and ratios commonly employed, methods used to adjust rates so that diverse populations can be compared, and measures
of association among variables that affect morbidity and mortality.

EPIDEMIOLOGIC APPROACHES TO THE STUDY OF ILLNESS

MacMahon, Pugh, and Ipsen (1960) divide epidemiologic investigations into four separate categories: 

(1) Descriptive Epidemiology. Descriptive epidemiology is concerned with distributions of disease and comparisons of
different populations or different segments of the same population on morbidity and mortality indices.

(2) Formulation of Hypotheses. Tentative explanations of observed disease distributions are attempted in terms of
possible causal associations of a direct nature.

(3) Analytic Epidemiology. This branch of epidemiology consists of observational studies designed specifically to
examine and test hypotheses developed from descriptive studies.

(4) Experimental Epidemiology. Experimental studies are conducted on human populations to confirm in a rigorous manner
hypotheses that stand the test of observational analytic studies.

It seems apparent that categories (2) and (3) refer to methods of statistical inference and could readily be combined.
We will treat both of these categories under the heading of analytic epidemiology where the epidemiologist or
biostatistician attempts to derive inductions, generalizations, or conclusions about questions he has posed by following accepted rules
of evidence. (The psychologist, of course, follows the same body of rules for arriving at conclusions also extending beyond
the immediate data.) Experimental studies in epidemiology, that is, random assignment of individuals to exposure situations
or treatments, are rarely possible because of practical and ethical constraints and will not be considered further here.
This paper will be concerned only with descriptive and analytic epidemiology.

DESCRIPTIVE EPIDEMIOLOGY

Descriptive statistics in epidemiology involves the use of standardized indices that reflect typical or usual values,
the amount of variability in sets of observations, and relationships among variables of interest. Thus, descriptive
epidemiology provides methods for organizing, summarizing, and communicating study results. Descriptive epidemiology is not
concerned with the causal implications or conclusions that might be drawn from sets of data; such inferences are the province of
analytic epidemiology.

The most commonly used descriptive variables in epidemiology pertain to time, person, and place. According to MacMahon
et al. (1960), "Statements of the frequency of a given trait or disease manifestation in various populations are essential
to descriptive epidemiology. Such statements permit comparisons between populations and between subgroups of a population
with respect to the manifestation in question" (p. 51). To control for differences in population or sample size, frequencies
must be expressed in the form of rates.

The calculation of rates is a simple procedure performed frequently in both epidemiological and psychological research.
A rate is a statement of probability expressed as a quantity or degree of the phenomenon measured per unit of population per unit
of time. What is known in epidemiology as a specific rate is known in probability theory as a conditional probability.

In epidemiology, this statement of probability involves three different items of information: (1) the number of persons 
affected by a particular illness, expressed as the numerator, (2) the population within which the affected persons are 
observed, expressed as the denominator, and (3) a specification of the time interval. 

ANALYTIC EPIDEMIOLOGY 

On the basis of a comparison of these rates in different populations or subgroups of a population, tentative hypotheses 
are formulated which posit causal connections between the observed distributions of disease and one or more variables or 
characteristics of the population. These hypotheses are then tested by specially designed observational studies. 

Analytic studies are typically conducted to determine whether or not an association is present between a certain
characteristic or combination of characteristics and a disease in a group of afflicted individuals. In these studies, comparisons
are made between a group of persons who have the disease and a group that does not. The methods employed in these studies
"depend upon observing (hence the term 'observational' studies) and quantifying whatever is being studied" (Ibrahim &
Spitzer, 1979, p. 139). An example of a study in psychology of this type would be the comparison of IQs among students of
different ethnic groups to determine if an association exists between race and intelligence.

Analytic studies usually fall within one of two broad categories: case-control and cohort. Both of these are referred
to in the research literature by various names, thereby creating some confusion over the nature of their differences
(Feinstein, 1979; Ibrahim & Spitzer, 1979; Lilienfeld, 1976; MacMahon et al., 1960). In the case-control method, affected
(cases) and nonaffected (controls) groups are compared to determine whether a particular characteristic occurs with greater
frequency among those affected by a certain disease or illness. There are two major forms of case-control study,
retrospective and cross-sectional. In the retrospective study, the objective is to establish if the characteristic was present in
the past. The investigator looks backward in time for exposure. In the cross-sectional study, the characteristic being
compared is present in both cases and controls at the time of the investigation. In both types of study, analysis proceeds from
effect to cause.

The second category of analytic study employs the cohort method. With this design, a particular population is examined
to determine if the characteristic that may be related to the disease being investigated is present in significant quantities.
The researcher looks forward in time for exposure and analysis proceeds from cause to effect. In this approach, also
referred to as a prospective study, the population may be followed for several years to determine which members develop or die from
the disease.

There are numerous advantages and disadvantages with either research approach. Prospective studies enable the researcher
to obtain direct estimates of the risk associated with a suspected causal factor and to reduce the probability of spurious
relationships resulting from bias in data collecting procedures, but they also generally are laborious, time-consuming, and
expensive (MacMahon et al., 1960, p. 4?). The retrospective study, on the other hand, is relatively quick and inexpensive,
easily repeatable, and can economically examine a large number of cases. However, it also is subject to selection,
information, and confounding biases (Feinstein, 1979; Ibrahim & Spitzer, 1979).

Whatever their relative merits and disadvantages, the objective of both types of study is to determine whether or not a
relationship exists between a particular trait or set of traits and a specific illness or disease. The comparison, in its
simplest, dichotomous form, is usually represented in a 2x2 table, as shown in Table 1. If a higher proportion of individuals
with the characteristic is found among the cases than the controls, an association between the disease and the characteristic
is indicated.


Table 1

Framework for the Study of Disease

                         Number of Individuals

                     With          Without
Characteristic       Disease       Disease       Total

With                 a             b             a+b = N1
Without              c             d             c+d = N2
Total                a+c           b+d           a+b+c+d = N


STATISTICAL METHODS FOR DESCRIPTIVE EPIDEMIOLOGY


Calculation of Rates

The description of a particular illness may utilize one or both of two types of rates, mortality and morbidity. To
compute a mortality or death rate, the following specific information is needed: (1) In the numerator is included the number of
deaths in the exposed or affected population during a certain time period. (2) In the denominator is the total population
group exposed to the risk of death. (3) A time factor, usually a 1-year interval, is specified. The annual death rate can
be calculated with this information.


\[
\text{Annual death rate (ADR) from all causes (per 1,000 population)} =
\frac{\text{Total number of deaths during a specified period of a year}}
     {\text{Number of persons in the population at mid-year}} \times 1{,}000
\]


Thus, if 1,200 deaths occurred in a population of 1,000,000 in 1980, the annual death rate would be: 


\[
\text{Annual death rate (ADR) in 1980 (per 1,000 population)} =
\frac{1{,}200 \text{ deaths in 1980}}{1{,}000{,}000 \text{ persons present as of July 1980}} \times 1{,}000
= 1.2 \text{ per 1,000 population}
\]

The units of time and population may be selected by the investigator to suit his own purposes, but they must be
specified. Death rates also can be made specific for a variety of characteristics, such as age, cause of death, marital status,
race, and occupation.


\[
\text{ADR from all causes for persons 18-24 (per 1,000 population)} =
\frac{\text{Number of deaths of persons 18-24 during a period of 1 year}}
     {\text{Number of 18-24 year olds in the population at mid-year}} \times 1{,}000
\]

\[
\text{ADR from lung cancer (per 1,000 population)} =
\frac{\text{Number of deaths from lung cancer per year}}
     {\text{Number of persons in the population at mid-year}} \times 1{,}000
\]
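
As a rough illustration of the death rate formulas above, the following Python sketch computes a crude and an age-specific annual death rate. The function name `rate_per` and the counts for the 18-24 age group are assumptions made for this example; only the 1980 figures (1,200 deaths in a population of 1,000,000) come from the text.

```python
def rate_per(events, population, per=1_000):
    """Events per `per` persons in the population exposed to risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Crude annual death rate from the text: 1,200 deaths among 1,000,000 persons.
adr_1980 = rate_per(1_200, 1_000_000)

# Age-specific rate for persons 18-24; both counts are hypothetical.
adr_18_24 = rate_per(90, 150_000)

print(f"ADR 1980: {adr_1980:.1f} per 1,000")
print(f"ADR 18-24 (hypothetical): {adr_18_24:.1f} per 1,000")
```

The same helper could serve for cause-specific rates by changing only the numerator, which mirrors how the formulas above differ from one another.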


Another type of rate frequently used is the "case fatality rate":

\[
\text{Case fatality rate (\%)} =
\frac{\text{Number of individuals dying during a specified period of time after onset or diagnosis of disease}}
     {\text{Number of individuals with the specified disease}} \times 100
\]


This rate represents the risk of dying during a definite period of time for those individuals who have the particular
disease. As with the death rate, the period of time during which the deaths occurred should be indicated. Case fatality rates
also can be made specific for age, sex, severity of disease, and any other factors of clinical and epidemiological importance.

The third mortality rate used in epidemiologic research is the proportion of total deaths due to a specific cause:

\[
\text{Proportionate mortality rate from cardiovascular diseases in the U.S. Navy in 1970} =
\frac{\text{Number of deaths from cardiovascular diseases in the U.S. Navy in 1970}}
     {\text{Total deaths in the U.S. Navy in 1970}} \times 100
\]

However, since this rate depends on two variables (the number of deaths from the specific cause and the total number of
deaths), it is of limited value in making comparisons between different populations or time periods. It also fails to
measure directly the risk or probability of a person in a population dying from a specific disease, as does a cause-specific
mortality rate.

One of the most frequently used morbidity rates is the incidence rate, which is defined as the number of new cases of a
disease that occur during a specified period within a specified unit of population.

\[
\text{Incidence rate per 1,000} =
\frac{\text{Number of new cases of a disease occurring in a population during a specified period of time}}
     {\text{Number of persons exposed to risk of developing the disease during that period of time}} \times 1{,}000
\]

Another morbidity rate is the prevalence rate, which measures the number of cases that are present at, or during, a specified
period of time. Under stable conditions, the prevalence rate approximately equals the incidence rate times the average
duration of the disease. For example, if the average duration of hypertension is three years and its incidence rate is 15 per
1,000 per year, the prevalence rate would be 45 per 1,000.

\[
\text{Prevalence rate per 1,000} =
\frac{\text{Number of cases of disease present in the population at a specified time}}
     {\text{Number of persons in the population at that specified time}} \times 1{,}000
\]





The two types of prevalence rates which are used by investigators are point prevalence and period prevalence. Point 
prevalence refers to the number of cases present at a specified moment in time; period prevalence refers to the number of 
cases that occur during a specified period of time, for example, a year. Period prevalence consists of the point prevalence 
at the beginning of a specified period of time plus all new cases that occur during that period. 
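
The relationships among incidence, duration, and the two kinds of prevalence described above can be sketched in Python. This is a minimal illustration: the function names are assumptions, the hypertension figures (15 per 1,000 and three years) come from the text, and the period prevalence counts are hypothetical.

```python
def prevalence_from_incidence(incidence_per_1000, mean_duration_years):
    """Approximation: prevalence = incidence x average duration of disease."""
    return incidence_per_1000 * mean_duration_years


def period_prevalence(cases_at_start, new_cases, population, per=1_000):
    """Cases present at the start of the period plus new cases, as a rate."""
    return (cases_at_start + new_cases) / population * per


# Hypertension example from the text: 15 per 1,000 with a 3-year duration.
hypertension_prevalence = prevalence_from_incidence(15, 3)

# Hypothetical community of 10,000: 450 existing cases, 150 new cases in a year.
period_rate = period_prevalence(450, 150, 10_000)

print(hypertension_prevalence)   # per 1,000
print(f"{period_rate:.1f}")      # per 1,000
```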

All forms of morbidity rates, including incidence and prevalence rates, can be made specific for age, sex, and/or any
other personal characteristics. They also can be standardized in the same manner as mortality rates.

AGE ADJUSTMENT FOR MORTALITY RATES 

The population characteristic that has the greatest influence on mortality rate is the age of the members. Since
differences in the age composition of a population will influence the total mortality rates, it is preferable to use
age-specific mortality rates in comparing the mortality experiences in two different geographical areas, population groups, or time
periods. To control for differences in the age distribution of a population, two different summary statistics may be
employed: the direct method of age adjustment and the standardized mortality ratio. Both rely upon a comparison of expected
rates of a standard or control group with the observed rates of the population under study.
Direct Method of Age Adjustment. The basic procedure for this method is to apply the age-specific mortality rates for
the two groups that are being compared to the number in the same age groups of the standard population. For most studies
conducted in the United States, the standard population is the population of the U.S. as determined in the 1940 census. This
procedure gives the number of deaths that can be expected in the standard population if these age-specific rates from the
observed populations had prevailed in the standard population. An example of the use of this method to adjust the
calculation of a mortality rate to control for age is found in Table 2.


Table 2

Calculation of the Age-Adjusted Mortality Rates from All Causes
by the Direct Method: United States, 1950 and 1960 (a)

                                            Standard Population:    Expected Number of Deaths
                    Mortality from All      Total U.S. Enumerated   that Would Occur in Standard
Age Group           Causes per 100,000      Population per          Population at Rates in
(Years)             Population              1,000,000

                      1950        1960        1940                    1950          1960
                       (1)         (2)         (3)                  (1)x(3)       (2)x(3)

< 1                3,299.2     2,696.4       15,343                  506.2         413.7
1-4                  139.4       109.1       64,718                   90.2          70.6
5-14                  60.1        46.6      170,355                  102.4          79.4
15-24                128.1       106.3      181,677                  232.7         193.1
25-34                178.7       146.4      162,066                  289.6         237.6
35-44                358.7       299.4      139,237                  499.4         416.9
45-54                853.9       756.0      117,811                1,006.0         890.7
55-64              1,901.0     1,735.1       80,294                1,526.4       1,393.2
65-74              4,104.3     3,822.1       48,426                1,987.5       1,850.9
75-84              9,331.1     8,745.2       17,303                1,614.6       1,513.2
85+               20,196.9    19,857.5        2,770                  559.5         550.4

Total death rate,
all ages             963.8       954.7            -                      -             -
Total population         -           -    1,000,000                      -             -
Total expected
number of deaths         -           -            -                8,414.5       7,609.7
Age-adjusted death
rate per 100,000         -           -            -                 841.45        760.97

(a) Source: Klebba, Mauer, and Glass (1973).

In Table 2, the age-adjusted rate in 1960 is much lower than in 1950, in contrast to the total death rates where the
1960 death rate is only slightly lower. The difference between the changes in the total and age-adjusted death rates results
from the fact that the 1960 population has a larger proportion of people in the older age groups than the 1950 population.
The total death rate is affected by both the age-specific death rates and the age distribution of the population. The age
adjustment procedure is used to remove the influence of the age distribution of the population by use of a standard
population.
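
The direct method can be sketched in a few lines of Python. The example below applies the 1950 age-specific rates from Table 2 to the 1940 standard population; the function name `direct_adjusted_rate` is an assumption introduced for illustration, and the division by 100,000 reflects the units of the rate column.

```python
# Age-specific death rates per 100,000 for 1950 (Table 2, column 1).
rates_1950 = [3299.2, 139.4, 60.1, 128.1, 178.7, 358.7,
              853.9, 1901.0, 4104.3, 9331.1, 20196.9]

# 1940 U.S. standard population per 1,000,000 (Table 2, column 3).
standard_pop = [15343, 64718, 170355, 181677, 162066, 139237,
                117811, 80294, 48426, 17303, 2770]


def direct_adjusted_rate(rates_per_100k, std_pop):
    """Return (expected deaths by age, total expected, adjusted rate per 100,000)."""
    expected = [r * n / 100_000 for r, n in zip(rates_per_100k, std_pop)]
    total_expected = sum(expected)
    adjusted = total_expected / sum(std_pop) * 100_000
    return expected, total_expected, adjusted


expected_1950, total_1950, adjusted_1950 = direct_adjusted_rate(rates_1950, standard_pop)
print(f"Total expected deaths: {total_1950:.1f}")
print(f"Age-adjusted death rate per 100,000: {adjusted_1950:.2f}")
```

Because the standard population is the same for every year being compared, any difference between adjusted rates reflects the age-specific rates alone.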


Standardized Mortality Ratio (SMR). A second method of age adjustment is a statistic widely used in studies of
occupational mortality. It is defined as the number of deaths, either total or cause-specific, in a given occupational group
expressed as a percentage of the number of deaths that would have been expected in that occupational group if the age- and
sex-specific rates in the general population were applicable. The statistic is calculated by using the formula:

\[
\text{Standardized Mortality Ratio (SMR, \%)} =
\frac{\text{Observed number of deaths per year}}{\text{Expected number of deaths per year}} \times 100
\]

The expected number of deaths per year of a particular sample is calculated by using the equation \(E = \sum ab\), where
a = the number of sample members belonging to a particular age group
b = the standard death rate in the general population for that same age group.





An example of this procedure is found in Lilienfeld (1976). Between 1949 and 1953, there were 7,320 deaths among male
farmers and farm managers in England, or an average of 1,464 deaths per year. In determining whether this figure indicates
a normal, high, or low mortality risk for this group, a standardized mortality ratio is calculated using the standard death
rates in England (see Table 3).


Table 3

Calculation of the Standardized Mortality Ratio for
Occupation of Male Farmers and Farm Managers for All Causes of Death:
1951 (a)

             Number of Farmers       Standard Death Rates     Expected Number of
             and Farm Managers       per 1,000,000 (All       Deaths for Farmers
             (Census, 1951)          Causes of Death)         and Farm Managers
                                                              per 1,000,000
Age                (1)                     (2)                (3) = (1) x (2)

20-24             7,989                   1,383                      11
25-34            37,030                   1,594                      59
35-44            60,838                   2,868                     174
45-54            68,687                   8,212                     564
55-64            55,565                  22,953                   1,275

Total expected deaths per year: 2,083
Total observed deaths per year: 1,464

\[
SMR = \frac{1{,}464}{2{,}083} \times 100 = 70.3\%
\]

(a) Source: Registrar General's Decennial Supplement (1958).


Table 3 gives the results of such a calculation. The SMR indicates that the mortality experience of farmers and farm 
managers was only 70.3% of the total population rate from all causes of death. 
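
The SMR calculation in Table 3 can be reproduced with a short Python sketch. The variable names are assumptions for illustration, and the expected deaths in each age group are rounded to whole numbers because that appears to be how the table arrives at its total of 2,083.

```python
# Number of male farmers and farm managers by age group (Table 3, column 1).
farmers = [7989, 37030, 60838, 68687, 55565]

# Standard death rates per 1,000,000, all causes (Table 3, column 2).
standard_rates = [1383, 1594, 2868, 8212, 22953]

observed_deaths_per_year = 1464

# E = sum of a*b over age groups, each product rounded to whole deaths.
expected_by_age = [round(a * b / 1_000_000) for a, b in zip(farmers, standard_rates)]
expected_deaths_per_year = sum(expected_by_age)

smr = observed_deaths_per_year / expected_deaths_per_year * 100
print(expected_by_age)
print(expected_deaths_per_year)
print(f"SMR = {smr:.1f}%")
```

Without the per-group rounding the expected total would be fractionally higher, so the resulting SMR could differ slightly in the first decimal place.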

Although the SMR is a widely used statistic in epidemiology, it also possesses certain limitations. Wong (1977), for
instance, notes that the comparison of SMRs is questionable. Gaffey (1976) cites three specific limitations to the SMR:
(1) the lack of a relationship between the SMR and the life expectancy of a particular population; (2) the unequal sizes of
the SMR and the relative risk, the discrepancy depending on the age of the study population; and (3) at older ages, the SMR
is subject to limitations on its possible values, more or less independently of any hazard to which the study population may
be exposed. Based upon these limitations, Gaffey recommends against the use of the SMR as an estimate of relative risk,
believing that the SMR in general will be a biased estimate of that relative risk and its bias will be different with each
age group. However, Symons and Taulbee (1981) state that the SMR can be a useful approximation of relative risk when (1) the
age-specific rates in the comparison population for the cause(s) of interest are no larger than about 100 per 10,000 subjects
per year, (2) the age bands are not too broad, and (3) the age-specific mortality rates for the study and comparison
populations are in approximately constant ratio across the age bands.

MEASURES OF ASSOCIATION 

In both retrospective and prospective studies, the object of research is to determine whether or not a correlation can
be established between a specific characteristic and the disease being examined. Psychologists have long employed
statistical methods, such as analyses of variance and regression analyses, to measure the strength or degree of association between
the variables. In some instances, epidemiologists employ similar methods; other instances require the use of methods not
found in psychology. In this paper, four specific methods commonly used in biostatistics which examine the relationship or
association between two or more variables are reviewed: Poisson distributions, relative risk, measures based on chi-squares,
and attributable risk.

Poisson Distributions. Hypothesis testing involves the use of a sampling distribution in which probabilities (expected
frequencies) are compared with outcome (observed frequencies). A distribution of probabilities indicates the likelihood of
each of the observed frequencies if the assumptions made regarding the phenomenon under study are actually correct. There
are three probability distributions commonly employed in statistical analyses: the normal distribution, binomial
distribution, and Poisson distribution. Psychologists are familiar with the first two; the third, however, is more commonly found
in biomedical research. Although the Poisson distribution is often used as an approximation of the binomial distribution,
it may provide a useful distribution in its own right.

The Poisson distribution is used when events or entities are random and independent of each other and when the
probability of an event is very small and, even if the sample size is large, only a small number of events are observed. These
distributions are likely to be obtained when the observations consist of counts such as the number of cases of a particular
illness over a fixed period of time.

The Poisson distribution consists of a distribution of random variables taking values of 0, 1, 2, .... If the variable
X has a particular value x, then the probability from the Poisson distribution that this value will occur is

\[
P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!}
\]

The quantity "e" in this formula is a constant with a value approximately equal to 2.71828. For variables that have a
Poisson distribution, the mean and the variance are equal, that is to say, \(\mu = \sigma^2\). In contrast to the binomial
distribution, which requires a knowledge of n and p, the Poisson distribution requires only a knowledge of the distribution
mean, or \(\mu\), which can take any value greater than zero.
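
A short Python sketch may help make the formula concrete. It evaluates the Poisson probability for each count and checks numerically that the mean and the variance coincide; the function name `poisson_pmf` and the mean of 2.0 cases per period are assumptions chosen only for illustration.

```python
import math


def poisson_pmf(x, mu):
    """Probability of observing exactly x events when the mean count is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


mu = 2.0  # hypothetical mean number of cases per time period

# Probabilities for counts 0..60; the tail beyond 60 is negligible for mu = 2.
probs = [poisson_pmf(x, mu) for x in range(61)]

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"P(X = 0) = {probs[0]:.4f}")
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```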

Remington and Schork (1970) present two general models which demonstrate the utility of the Poisson distribution. The
first is characterized by a large quantity of some medium such as sea water or air in which are found a large number of small, 
discrete entities, such as plankton or bacteria. One of the most important traits of this model is that there is a uniform 
density of the small entities throughout the medium. When a small quantity of this medium is examined, the probability that 
this sample will contain x number of entities is the Poisson probability. 

The second model producing Poisson probabilities concerns events occurring in time. An example of such an event is a member
of a community contracting a particular illness or disease. If (1) the events occur independently, (2) the probability that an
event will occur in a short time interval is proportional to the length of the interval, and (3) the time interval is short
enough that the probability of more than one event occurring in it is negligible, then the probability
that x events occur in a fixed time interval is the Poisson probability.

Relative Risk . The most common measure of association in retrospective studies is relative risk which reflects the 
incidence of disease among a group possessing a certain characteristic relative to a group without the characteristic. The 
measure indicates the likelihood a member of a specified population will acquire and/or succumb to a disease if he possesses 
the characteristic under study. Thus, a study which determines that the relative risk of lung cancer for cigarette smokers 
is 3.3 is stating that the risk of contracting lung cancer is 3.3 times greater for smokers than for nonsmokers. 

Relative risk is calculated from a 2x2 table in which the numbers of cases and controls are compared with respect to the
presence or absence of a particular characteristic (see Table 1). The cross products are then multiplied and divided,
producing the following equation:

\[
RR = \frac{ad}{bc}
\]

This equation gives an approximation of relative risk and assumes that (1) the cases and controls have been selected at
random and are representative of the larger population, and (2) the frequency of the disease in a population is relatively small.
If RR is equal to 1 or unity, then \(ad/bc\) as an approximation of relative risk is exact. This equation is the one used most
often in calculating relative risk.
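
The role of the second assumption can be illustrated numerically. In the Python sketch below, the cross-product ratio ad/bc is compared with a ratio of risks computed directly from the rows of a 2x2 table laid out as in Table 1; the function names and all counts are hypothetical and serve only to show that the approximation is close when the disease is rare and poor when it is common.

```python
def risk_ratio(a, b, c, d):
    """Risk among those with the characteristic divided by risk among those without."""
    return (a / (a + b)) / (c / (c + d))


def cross_product_ratio(a, b, c, d):
    """The ad/bc approximation of relative risk."""
    return (a * d) / (b * c)


# Hypothetical rare disease: 30 of 10,000 exposed and 10 of 10,000 unexposed affected.
rare = (30, 9970, 10, 9990)

# Hypothetical common disease: 600 of 1,000 exposed and 200 of 1,000 unexposed affected.
common = (600, 400, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*table), 2), round(cross_product_ratio(*table), 2))
```

In both hypothetical tables the ratio of risks is 3, but only the rare-disease table yields a cross-product ratio near that value.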

If the frequency of disease in a population is large or the approximation of RR proves to be inadequate, i.e., in cases
where there are multiple categories of groups (different subgroups by age or occupation) under study, there is a more
accurate measure developed by Mantel and Haenszel (1959). The revised relative risk is calculated as follows:


\[
RR = \frac{\sum (ad/N)}{\sum (bc/N)}
\]

In addition, Mantel and Haenszel have calculated summary relative risk equations for separate subcategories of exposure. The
rationale for these equations is that "...over-all relative risk estimates are averages and as averages may conceal
substantial variation in the magnitudes of the relative risk among subgroups. Ordinarily, the individual subcategory data should be
examined, paying special attention to relative risks based on reasonably large sample sizes. This will provide protection
against the potential deficiencies of any particular summary relative risk formula employed" (Mantel & Haenszel, 1959, p.
740). An example of such a summary relative risk formula is the following:


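\[
RR = \frac{ad/bc}{E(a)E(d)/E(b)E(c)}
\]

where:

\[
E(a) = \frac{N_1 M_1}{N}, \qquad
E(b) = \frac{N_1 M_2}{N}, \qquad
E(c) = \frac{N_2 M_1}{N}, \qquad
E(d) = \frac{N_2 M_2}{N}
\]

Here \(N_1\) and \(N_2\) are the row totals of Table 1, and \(M_1 = a+c\) and \(M_2 = b+d\) are its column totals.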

A simpler method of calculating relative risk for multiple categories, however, is to prepare a series of 2x2 tables
comparing controls and cases at different levels of exposure and then to compute the relative risk for each table. An
example of such a procedure is found in Lilienfeld (1976) (see Table 4).

Age adjustment procedures are also important when calculating relative risk. One such procedure is the matching case
method in which a sample of N diseased individuals is drawn and the characteristics of each individual noted with respect to
the control factors. Subsequently, a sample of N well individuals is drawn, with each individual matched on the control
factors to one of the diseased individuals. In applying such a procedure, the 2x2 table takes on a different form from that
shown in Table 4. The cell in Table 5 in the upper left-hand corner contains r number of pairs in which both cases and
controls possess the characteristic of interest. The marginal totals (a, b, c, d) represent the entries in the cells of Table 1
and the total for the entire table is ½N pairs where N represents the total number of paired individuals. The calculation of
the relative risk for this table would be:

RR ^ f (P rovided * f O'* 

Table 4

Relative Risk for Smokers and Nonsmokers

Example of Calculating Relative Risk for Multiple Categories

Daily Average          Lung Cancer                  Relative Risk of Different
Cigarettes Smoked      Patients        Controls     Categories of Smokers to Nonsmokers

     0                      7              61                    1.0
     1-4                   55             129                    3.7
     5-14                 489             570                    7.5
     15-24                475             431                    9.6
     25-39                293             154                   16.6
     50+                   38              12                   27.6

The different degrees or levels of cigarette smoking are to be compared with the nonsmokers, and, therefore, the relative
risk of lung cancer for nonsmokers is taken to be 1.0. The risks for smokers compared to nonsmokers are:

$$RR\ (\text{1-4 cigarettes daily}) = \frac{55 \times 61}{7 \times 129} = 3.7$$

$$RR\ (\text{5-14 cigarettes daily}) = \frac{489 \times 61}{7 \times 570} = 7.5$$

$$RR\ (\text{15-24 cigarettes daily}) = \frac{475 \times 61}{7 \times 431} = 9.6$$
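The same arithmetic can be repeated for every row of Table 4 with a short Python sketch. The counts are those printed in Table 4; the names `table4` and `relative_risks` are hypothetical and are introduced only for this illustration.

```python
# (daily cigarettes, lung cancer patients, controls), as listed in Table 4.
table4 = [
    ("0", 7, 61),
    ("1-4", 55, 129),
    ("5-14", 489, 570),
    ("15-24", 475, 431),
    ("25-39", 293, 154),
    ("50+", 38, 12),
]


def relative_risks(rows):
    """Relative risk of each exposure level against the first (reference) row."""
    _, ref_cases, ref_controls = rows[0]
    return {
        label: (cases * ref_controls) / (ref_cases * controls)
        for label, cases, controls in rows
    }


for label, rr in relative_risks(table4).items():
    print(f"{label:>6}: {rr:.1f}")
```

Each exposure level is treated as its own 2x2 table against the nonsmokers, so the reference row always has a relative risk of 1.0.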


Table 5

Model of Calculation of Relative Risk for Matched Cases
and Controls With and Without a Characteristic

                                          Controls
                               With              Without
Cases                          Characteristic    Characteristic    Total

With characteristic                 r                 s              a*
Without characteristic              t                 u              c*
Total                               b*                d*             ½N

*a, b, c, and d are the entries in the cells in Table 1.


A test of whether or not the observed difference between ad and bc is due to sampling variation is provided by a
chi-square test for 2x2 tables. Mantel and Haenszel have developed a chi-square formula specifically for use in testing the
significance of a relative risk estimate:


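$$\chi^2_{MH} = \frac{\left(\left|\sum a - \sum E(a)\right| - \tfrac{1}{2}\right)^2}{\sum V(a)}$$

where

$$E(a) = \frac{N_1 M_1}{N}$$

and

$$V(a) = \frac{N_1 N_2 M_1 M_2}{N^2 (N-1)}$$
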


If χ² > 3.84 (the critical value of chi-square with 1 df at the .05 level), one may conclude that it is unlikely that the
difference in risk between the group with and the group without the characteristic is a result of chance.
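A minimal Python sketch of this test follows. It is an illustration under stated assumptions rather than the report's own procedure: the function name `mantel_haenszel_chi2` is hypothetical, and N1, N2 and M1, M2 are taken to be the row and column totals of each 2x2 table, which the surviving text does not spell out.

```python
def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (1 df) over one or more 2x2 tables.

    Each table is (a, b, c, d). N1, N2 are assumed to be the row totals
    and M1, M2 the column totals of that table.
    """
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d
        m1, m2 = a + c, b + d
        n = n1 + n2
        sum_a += a
        sum_expected += n1 * m1 / n                           # E(a)
        sum_variance += n1 * n2 * m1 * m2 / (n ** 2 * (n - 1))  # V(a)
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance


# Table 4, 1-4 cigarettes daily versus nonsmokers.
chi2 = mantel_haenszel_chi2([(55, 129, 7, 61)])
print(chi2 > 3.84)
```

Because E(a) and V(a) depend only on the margins, swapping the roles of rows and columns leaves the statistic unchanged.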

When testing for significance in a matched pairs example such as in Table 5, the McNemar test, where:

$$\chi^2 = \frac{(|t - s| - 1)^2}{t + s} \quad \text{with 1 df}$$

may be employed.
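For matched pairs, both the relative risk and the McNemar statistic depend only on the two kinds of discordant pairs. The sketch below assumes that s counts pairs in which only the case has the characteristic and t counts pairs in which only the control has it; the function names and the counts are hypothetical.

```python
def matched_pair_rr(s, t):
    """Relative risk from discordant pairs, defined only when t is nonzero."""
    if t == 0:
        raise ValueError("relative risk is undefined when t = 0")
    return s / t


def mcnemar_chi2(s, t):
    """McNemar chi-square with continuity correction (1 df)."""
    if s + t == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    return (abs(t - s) - 1) ** 2 / (t + s)


# Hypothetical study: 25 pairs with only the case exposed, 10 with only the control.
s, t = 25, 10
print(matched_pair_rr(s, t))   # 2.5
print(mcnemar_chi2(s, t))      # 5.6, which exceeds 3.84
```

Pairs in which case and control agree (both with or both without the characteristic) do not enter either calculation.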

In establishing confidence limits for the relative risk, confidence limits of the logarithm (to the base e) of a
corrected relative risk are computed, and the logarithmic confidence limits are then converted back to the original scale.
The correction consists of adding 0.5 to the numbers a, b, c, and d, which corrects for a bias that can occur with small
numbers of observations. Using the log-relative risk rather than the relative risk itself simplifies the calculation of the
standard errors necessary for computing confidence intervals.
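The text does not give the standard error formula, so the following Python sketch should be read as one common way of carrying out the procedure, not as the report's own method. It assumes the usual large-sample approximation in which the variance of the log relative risk is the sum of the reciprocals of the four corrected cell counts; the function name and the default z = 1.96 (95 percent limits) are likewise assumptions.

```python
import math


def log_rr_confidence_limits(a, b, c, d, z=1.96):
    """Return (lower, point, upper) for ad/bc after adding 0.5 to each cell.

    Assumes SE(ln RR) = sqrt(1/a + 1/b + 1/c + 1/d) on the corrected counts,
    a standard approximation that is not stated in the source text.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_rr = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_rr - z * se)   # back to the original scale
    upper = math.exp(log_rr + z * se)
    return lower, math.exp(log_rr), upper


# Table 4, 1-4 cigarettes daily versus nonsmokers.
lower, point, upper = log_rr_confidence_limits(55, 129, 7, 61)
print(f"{lower:.2f} < {point:.2f} < {upper:.2f}")
```

The limits are symmetric on the logarithmic scale and therefore asymmetric around the relative risk itself.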

Using the Chi-Square Test. The chi-square, a statistical tool familiar to most psychologists, has two basic uses in
epidemiologic research. First, the chi-square test may be used to evaluate whether or not frequencies which have been
empirically obtained differ significantly from those which would be expected under a certain set of theoretical assumptions,
that is, testing the null hypothesis. Second, the chi-square test may be used in determining the degree or strength of an
association. As with the standardized mortality ratio, the chi-square is based upon a comparison of observed and expected
frequencies. Chi-square is obtained by summing, over all cells, the square of the difference between the observed and
expected frequencies in each cell divided by the expected number of cases in that cell:


$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

χ² = 0 when all observed and expected frequencies are identical. If f_o - f_e = 0 in every cell, then the null hypothesis
cannot be rejected. The greater the difference between observed and expected frequencies, the greater the chance the null
hypothesis is rejected. The chi-square for a 2x2 table may be calculated using the formula

$$\chi^2 = \frac{\left(|ad - bc| - N/2\right)^2 N}{(a+b)(d+c)(a+c)(d+b)}$$


In the case of a 2x2 table where any of the expected frequencies are 5 or less, a correction for continuity, known as the
Yates correction, can be made by adding or subtracting 0.5 from the observed frequencies in order to reduce the magnitude of
the chi-square. The usefulness of this modification, however, has been subject to debate (cf. Fleiss, 1973; Mantel &
Greenhouse, 1968; Grizzle, 1967; Remington & Schork, 1970).
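The general definition and the 2x2 shortcut can be compared in a short Python sketch. The function names are hypothetical. The `yates` flag switches the N/2 term on or off, and clamping the corrected difference at zero is an added safeguard that is not taken from the text.

```python
def chi2_from_frequencies(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_2x2(a, b, c, d, yates=True):
    """Shortcut chi-square for a 2x2 table, optionally with the N/2 correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # clamp so the correction cannot overshoot
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a marginal total is zero")
    return n * diff ** 2 / denominator


# Table 4, 1-4 cigarettes daily versus nonsmokers.
a, b, c, d = 55, 129, 7, 61
n = a + b + c + d
expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n]
print(chi2_from_frequencies([a, b, c, d], expected))  # general definition
print(chi2_2x2(a, b, c, d, yates=False))              # same value via the shortcut
print(chi2_2x2(a, b, c, d, yates=True))               # smaller, after correction
```

Without the correction the shortcut is algebraically identical to the general sum; with it, the statistic is always somewhat smaller.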

A common fallacy in employing the chi-square test is to use the value of chi-square itself as a measure of the degree to
which a disease and a characteristic are associated with one another. Even though chi-square is excellent as a measure of
the significance of an association, it does not indicate the degree of association because it is a function both of the
proportions in the various cells and the total number of subjects studied. The degree of association present is really only
a function of the cell proportions, which explains why relative risk is used as a measure of association.

There are, however, measures based upon the chi-square test which do provide a measure of the degree of the association
between an illness and a specific characteristic. One such measure is the phi coefficient. The phi coefficient, or φ, gives a
numerical value, ranging from 0 to +1, for a relationship between two variables and is similar in meaning to a correlation
coefficient. It is calculated by using the following formula:

$$\phi^2 = \frac{(ad - bc)(ad - bc)}{(a+b)(a+c)(d+c)(b+d)} = \frac{\chi^2}{N}$$

Another measure is V, or Cramér's measure, and is calculated by using the formula:

$$V^2 = \frac{\chi^2}{N \, \mathrm{Min}(r-1,\, c-1)} = \frac{\phi^2}{\mathrm{Min}(r-1,\, c-1)}$$

where Min (r-1, c-1) refers to either r-1 or c-1, whichever is the smaller.

Another measure is Pearson's contingency coefficient, where:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
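The three chi-square-based measures can be computed side by side. The Python sketch below uses hypothetical function names and a hypothetical chi-square value, and it assumes the chi-square supplied is the uncorrected statistic (no continuity correction).

```python
import math


def phi_coefficient(chi2, n):
    """phi = sqrt(chi2 / N), for a 2x2 table."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (N * min(r - 1, c - 1))) for an r x c table."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


def contingency_coefficient(chi2, n):
    """Pearson's C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical result: chi-square of 10.0 from a 2x2 table of 250 subjects.
chi2, n = 10.0, 250
print(phi_coefficient(chi2, n))          # 0.2
print(cramers_v(chi2, n, 2, 2))          # equals phi for a 2x2 table
print(contingency_coefficient(chi2, n))  # slightly smaller than phi
```

Unlike chi-square itself, none of these values grows merely because the number of subjects increases while the cell proportions stay the same.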

Attributable Risk. A fourth measure of the association between a disease and a particular characteristic is "attributable
risk." The measure was initially defined in terms of lung cancer and smoking as the maximum proportion of lung cancer
attributable to cigarette smoking (Levin, 1953). It is expressed as:

$$AR = \frac{b(r-1)}{b(r-1)+1}$$

where r = relative risk of lung cancer among cigarette smokers as compared to nonsmokers, and b = proportion of the total
population classified as cigarette smokers.

The effect of relative risk (r) and the proportion of those with a characteristic in the population (b) on the values of
the attributable risk are shown by calculations of the attributable risk for different values of r and b in Table 6. Thus,
"...when the frequency of a characteristic, such as cigarette smoking is low and the relative risk for a disease among
cigarette smokers is also low only a small proportion of the cases of disease can be attributed to cigarette smoking"
(Lilienfeld, 1976). The reverse is true when the relative risk and the proportion of smokers are high. Attributable risk,
therefore, allows one to estimate the extent to which a particular disease is due to a specific factor.

Table 6

Attributable Risks as a Proportion for Selected Values of Relative Risk
and Proportion of Population with the Characteristic

b = Proportion of Population                r = Relative Risk
With Characteristic (percent)           2       4      10      12

            10                         .09     .23     .47     .52
            30                         .23     .47     .73     .77
            50                         .33     .60     .82     .84
            70                         .41     .67     .86     .89
            90                         .47     .73     .89     .91
            95                         .49     .74     .90     .92

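The attributable risk expression, AR = b(r-1) / (b(r-1) + 1), can be evaluated over the same grid of values as Table 6 with a few lines of Python. The function name `attributable_risk` is hypothetical; r and b are as defined in the text, with b entered as a proportion rather than a percentage.

```python
def attributable_risk(r, b):
    """Attributable risk for relative risk r and exposed proportion b."""
    excess = b * (r - 1)
    return excess / (excess + 1)


relative_risks = (2, 4, 10, 12)
percent_exposed = (10, 30, 50, 70, 90, 95)

print("b (%)  " + "".join(f"r={r:<6}" for r in relative_risks))
for percent in percent_exposed:
    row = [attributable_risk(r, percent / 100) for r in relative_risks]
    print(f"{percent:<7}" + "".join(f"{value:<8.2f}" for value in row))
```

When r = 1 the measure is zero whatever the value of b, since the characteristic then carries no excess risk.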


Attributable risk is particularly useful for the study of mortality. For the study of fertility or recurrent diseases,
however, the measure is limited because the relative risk involved is the ratio of two probabilities (Park, 1981). Park
provides a modification of the attributable risk measure that is suitable for recurrent events.

SUMMARY 

Even a brief survey of epidemiologic and biomedical research reveals the use of statistical methods quite familiar to
psychologists. Hypotheses are formulated and tested in much the same manner, and chi-squares, regression, correlation, and
analyses of variance are commonly employed in the effort to study the relationships between morbidity, mortality, and
numerous other environmental, physiological, social, cultural, and psychological variables.

There are, however, key statistical concepts widely used in epidemiology and biostatistics but seldom seen in psychology.
Epidemiologic studies employ rates and measures of association which indicate the degree of a relationship between
a disease and one or more characteristics. Both the rates and the measures of association are based upon statistical
concepts and principles underlying the methods utilized by psychologists. Rates, such as the standardized mortality ratio and
the incidence rate, are measures of probability in which a group of people is compared with the larger population over a
specified period of time. Measures of association such as the relative risk and phi coefficient are grounded in the
comparison between observed and expected frequencies on which the chi-square test employed by psychologists is based.

Despite the differences in terminology and the frequent use of rates in epidemiologic and biomedical research which are
not found in psychology, this brief review indicates that the gap between biostatistics and psychological statistics is
neither large nor complex. Psychologists should be able to readily familiarize themselves with this new ground, as the
concepts which underlie the methods of epidemiology and biostatistics are also found in the statistical methods of psychology.
Other than an understanding of the terminology employed, relatively few shifts in statistical thinking are necessary to
attain a basic comprehension of epidemiologic findings.